In this exercise, we will be using functions from the
tidyverse package. You can see we’ve added the chunk option
message = FALSE to hide the version information that
tidyverse normally displays.
library(tidyverse)
The file icecore.csv contains CO2 concentration measurements made in ice cores from Antarctica over time, where time is defined as the age of the air in years before 2008. That is, an air age of 0 would mean 2008, while an air age of -10008 would be 8000 BC.
Read it into a data frame called icecore.
Hints:
- You’ll need to insert a code chunk and then add code inside that chunk.
- Check back with Exercise 1.2 if you can’t remember how to read a CSV file.
- You can hide the list of columns by changing {r} at the start of your code chunk to {r, message = FALSE}.
icecore <- read_csv("icecore.csv")
Use ggplot to make a simple scatter plot of the air_age_AD and CO2_ppm variables in the icecore data frame. (Think: which of these is more appropriate for the x axis and which is more appropriate for the y axis?)
Hint: you will need to specify the data frame to plot, provide mappings from columns to aesthetic attributes, and add a geom layer.
Look at the axis labels. Is there anything about them you don’t understand or don’t like?
ggplot(icecore, aes(x = air_age_AD, y = CO2_ppm)) +
  geom_point()
The x axis is labelled in scientific notation; -4e+05 means -400,000. We’ll show you tomorrow how to change this into something more readable.
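If you are curious in the meantime, here is a minimal sketch of one way to get plain-number labels. It is not necessarily the approach the course will use tomorrow. The toy data frame is made up for illustration, and the sketch assumes the scales package is available (it is installed alongside ggplot2).

```r
library(ggplot2)

# Made-up stand-in for the icecore data; these values are for illustration only.
toy <- data.frame(
  air_age_AD = c(-400000, -300000, -200000, -100000, 0),
  CO2_ppm    = c(280, 240, 200, 260, 285)
)

p <- ggplot(toy, aes(x = air_age_AD, y = CO2_ppm)) +
  geom_point() +
  # label_comma() builds a labelling function that writes numbers with
  # thousands separators instead of scientific notation.
  scale_x_continuous(labels = scales::label_comma())

p
```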
The icecore dataset includes samples from multiple ice cores, which are recorded in the core variable. Use an appropriate aesthetic (e.g. colour or shape) to display which core each sample came from.
ggplot(icecore, aes(x = air_age_AD, y = CO2_ppm, colour = core)) +
  geom_point()
Make a copy of your code above and change it so it produces a line graph. Try making a line graph with and without points.
ggplot(icecore, aes(x = air_age_AD, y = CO2_ppm, colour = core)) +
  geom_point() +
  geom_line()
Make a histogram of the CO2_ppm variable.
Choose an appropriate width for the bins (binwidth = XXX) and make sure the bins line up on a round number (boundary = XXX).
ggplot(icecore, aes(x = CO2_ppm)) +
  geom_histogram(binwidth = 10, boundary = 200)
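To see what boundary actually does, it can help to look at the bin edges ggplot computes. The sketch below uses made-up concentrations (not the real icecore data) and the layer_data() helper from ggplot2 to pull out the edges. It suggests that, with a bin width of 10, any boundary that is a multiple of 10 (such as 0 or 200) gives the same bins.

```r
library(ggplot2)

# Made-up concentrations for illustration only.
toy <- data.frame(
  CO2_ppm = c(183, 197, 204, 211, 226, 238, 251, 264, 279, 286)
)

p_200 <- ggplot(toy, aes(x = CO2_ppm)) +
  geom_histogram(binwidth = 10, boundary = 200)

p_0 <- ggplot(toy, aes(x = CO2_ppm)) +
  geom_histogram(binwidth = 10, boundary = 0)

# layer_data() returns the computed bins; xmin holds the left edge of each bin.
edges_200 <- layer_data(p_200)$xmin
edges_0   <- layer_data(p_0)$xmin

# Compare the two sets of edges.
all.equal(edges_200, edges_0)
```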
Copy and paste your code from the last question, then use the facet_wrap() function to facet the data by core.
What can you say about the distribution of CO2 concentrations? Is it the same in every core?
ggplot(icecore, aes(x = CO2_ppm)) +
  geom_histogram(binwidth = 10, boundary = 0) +
  facet_wrap(vars(core), ncol = 2)
Make a box plot showing the distribution of CO2 concentration by core.
Decide whether the box plots should be vertical or horizontal.
Experiment with adding the width = 0.2 option inside geom_boxplot().
ggplot(icecore, aes(x = CO2_ppm, y = core)) +
  geom_boxplot(width = 0.2)
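The solution above draws horizontal boxes. For comparison, here is a small sketch of both orientations using made-up data for two hypothetical cores. It assumes a recent version of ggplot2, which works out the orientation from which axis carries the categorical variable.

```r
library(ggplot2)

# Made-up measurements for two hypothetical cores, for illustration only.
toy <- data.frame(
  core    = rep(c("Core A", "Core B"), each = 5),
  CO2_ppm = c(190, 205, 220, 240, 260, 200, 215, 230, 255, 280)
)

# Horizontal boxes: continuous variable on x, categorical variable on y.
p_horizontal <- ggplot(toy, aes(x = CO2_ppm, y = core)) +
  geom_boxplot(width = 0.2)

# Vertical boxes: swap the two mappings.
p_vertical <- ggplot(toy, aes(x = core, y = CO2_ppm)) +
  geom_boxplot(width = 0.2)
```

Horizontal boxes are often easier to read when the category names are long, because the labels do not overlap.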
The file afl_grand_finals.csv contains information on every Australian Football League (AFL) grand final played, from 1898 to 2019.
Read it into a data frame called afl_grand_finals and make a bar chart of the variable winner. (Think: should the winning team be displayed on the x or the y axis?)
What do you make of the team “NA”? (What happens when an AFL grand final ends in a draw?)
What order are the teams shown in?
afl_grand_finals <- read_csv("afl_grand_finals.csv")
ggplot(afl_grand_finals, aes(y = winner)) +
  geom_bar()
The “NA” team represents a grand final ending in a draw, which has happened three times in the history of the AFL. When this happened, the match was replayed the following week. This practice was controversial, and the AFL abolished grand final replays in 2016.
The teams are plotted in alphabetical order from the bottom up, so they read in reverse alphabetical order from top to bottom, with “Adelaide” at the bottom and “Western Bulldogs” near the top. “NA” is a special value indicating missing data, and is sorted after all others.
We will cover how to remove data with missing values in Exercise 2.2.
Conventionally, a scatter plot is used to display the relationship between two continuous variables. Sometimes it’s also appropriate to plot points where one axis shows a categorical variable.
Make a scatter plot with year on the x axis and winner on the y axis.
Add a line connecting the points.
How do you think ggplot knows which points to join with lines? (There is actually a complicated set of rules in place, which can be overridden if needed.)
If you’re starting to feel comfortable with ggplot, try using the code on the slides to reverse the order of categories.
ggplot(afl_grand_finals, aes(x = year, y = winner)) +
  geom_point() +
  geom_line() +
  scale_y_discrete(limits = rev)
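On the question of which points get joined: by default, ggplot puts rows into groups based on the discrete variables in the plot, and geom_line() draws one line per group. The sketch below uses made-up results for two hypothetical teams to show the default and one way of overriding it with the group aesthetic.

```r
library(ggplot2)

# Made-up results for illustration only; not taken from afl_grand_finals.csv.
toy <- data.frame(
  year   = c(2001, 2002, 2003, 2004, 2005, 2006),
  winner = c("Reds", "Blues", "Reds", "Blues", "Reds", "Blues")
)

# Default: one group per level of the discrete variable (winner),
# so each team gets its own line.
p_default <- ggplot(toy, aes(x = year, y = winner)) +
  geom_point() +
  geom_line()

# Override: group = 1 puts every row in a single group, so one line
# joins all the points in order of year.
p_single <- ggplot(toy, aes(x = year, y = winner, group = 1)) +
  geom_point() +
  geom_line()

# layer_data() shows the group assigned to each row of the line layer (layer 2).
unique(layer_data(p_default, 2)$group)
unique(layer_data(p_single, 2)$group)
```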
If you’ve finished early, go back to the plots you’ve made and make sure they all have appropriate axis labels, using the labs() function demonstrated in the lecture slides.
By default, ggplot uses variable names for axis labels. This is very helpful when you’re making plots for your own consumption, but if you want to share your graphics with others, you would usually want to provide something more descriptive.
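As a rough illustration of labs(), the sketch below relabels both axes of a toy version of the ice core plot. The data values and the label wording are assumptions made for this example, so choose whatever wording best describes your own plots.

```r
library(ggplot2)

# Made-up stand-in for the icecore data; these values are for illustration only.
toy <- data.frame(
  air_age_AD = c(-400000, -300000, -200000, -100000, 0),
  CO2_ppm    = c(280, 240, 200, 260, 285)
)

p <- ggplot(toy, aes(x = air_age_AD, y = CO2_ppm)) +
  geom_line() +
  # Replace the default variable-name labels with descriptive ones.
  labs(
    x = "Age of air (year AD)",
    y = "CO2 concentration (ppm)"
  )

p
```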
© 2021 Statistical Consulting Centre, The University of Melbourne.